AITopics

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(4 more...)

Neural Information Processing Systems | Jun-13-2026, 15:57:12 GMT

Mozart: Modularized and Efficient MoE Training on 3.5D Wafer-Scale Chiplet Architectures

The Mixture-of-Experts (MoE) architecture offers enhanced efficiency for Large Language Models (LLMs) with modularized computation, yet its inherent sparsity poses significant hardware deployment challenges, including memory locality issues, communication overhead, and inefficient computing resource utilization. Inspired by the modular organization of the human brain, we propose Mozart, a novel algorithm-hardware co-design framework tailored for efficient training of MoE-based LLMs on 3.5D wafer-scale chiplet architectures. On the algorithm side, Mozart exploits the inherent modularity of chiplets and introduces: (1) an expert allocation strategy that enables efficient on-package all-to-all communication, and (2) a fine-grained scheduling mechanism that improves communication-computation overlap through streaming tokens and experts. On the architecture side, Mozart adaptively co-locates heterogeneous modules on specialized chiplets with a 2.5D NoP-Tree topology and hierarchical memory structure. Evaluation across three popular MoE models demonstrates significant efficiency gains, enabling more effective parallelization and resource utilization for large-scale modularized MoE-LLMs.

artificial intelligence, large language model, natural language, (6 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Neural Information Processing Systems | Feb-11-2026, 11:58:06 GMT

10493aa88605cad5ab4752b04a63d172-Paper.pdf

agent, fair-efficient reward, fairness, (13 more...)

Country:

North America > Canada (0.04)
Asia > China (0.04)

Industry: Leisure & Entertainment (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems | Feb-11-2026, 11:57:51 GMT

10493aa88605cad5ab4752b04a63d172-AuthorFeedback.pdf

fair-efficient reward, hierarchy, resource utilization, (15 more...)

Technology: Information Technology > Artificial Intelligence (0.51)

Wijethilake, Kasun Eranda, Mahmood, Adnan, Sheng, Quan Z.

FedCLF -- Towards Efficient Participant Selection for Federated Learning in Heterogeneous IoV Networks

arXiv.org Artificial Intelligence | Oct-30-2025

Federated Learning (FL) is a distributed machine learning technique that preserves data privacy by sharing only the trained parameters instead of the client data. This makes FL ideal for highly dynamic, heterogeneous, and time-critical applications, in particular Internet of Vehicles (IoV) networks. However, FL encounters considerable challenges in such networks owing to the high data and device heterogeneity. To address these challenges, we propose FedCLF, i.e., FL with Calibrated Loss and Feedback control, which introduces calibrated loss as a utility in the participant selection process and a feedback control mechanism to dynamically adjust the sampling frequency of the clients. The envisaged approach (a) enhances the overall model accuracy in the case of highly heterogeneous data and (b) optimizes resource utilization for resource-constrained IoV networks, thereby leading to increased efficiency in the FL process. We evaluated FedCLF against baseline models, i.e., FedAvg, Newt, and Oort, using the CIFAR-10 dataset with varying data heterogeneity. Our results show that FedCLF significantly outperforms the baseline models, by up to 16% in scenarios with high data heterogeneity, while improving efficiency via reduced sampling frequency.

artificial intelligence, fedclf, machine learning, (14 more...)

doi: 10.1007/978-981-96-0814-0_15

2509.25233

Country: North America (0.46)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

arXiv.org Artificial Intelligence | Oct-15-2025

PubSub-VFL: Towards Efficient Two-Party Split Learning in Heterogeneous Environments via Publisher/Subscriber Architecture

Liu, Yi, Liu, Yang, Zheng, Leqian, Hong, Jue, Shi, Junjie, Yang, Qingyou, Wu, Ye, Wang, Cong

With the rapid advancement of the digital economy, data collaboration between organizations has become a well-established business model, driving the growth of various industries. However, privacy concerns make direct data sharing impractical. To address this, Two-Party Split Learning (a.k.a. Vertical Federated Learning (VFL)) has emerged as a promising solution for secure collaborative learning. Despite its advantages, this architecture still suffers from low computational resource utilization and training efficiency. Specifically, its synchronous dependency design increases training latency, while resource and data heterogeneity among participants further hinder efficient computation. To overcome these challenges, we propose PubSub-VFL, a novel VFL paradigm with a Publisher/Subscriber architecture optimized for two-party collaborative learning with high computational efficiency. PubSub-VFL leverages the decoupling capabilities of the Pub/Sub architecture and the data parallelism of the parameter server architecture to design a hierarchical asynchronous mechanism, reducing training latency and improving system efficiency. Additionally, to mitigate the training imbalance caused by resource and data heterogeneity, we formalize an optimization problem based on participants' system profiles, enabling the selection of optimal hyperparameters while preserving privacy. We conduct a theoretical analysis to demonstrate that PubSub-VFL achieves stable convergence and is compatible with security protocols such as differential privacy. Extensive case studies on five benchmark datasets further validate its effectiveness, showing that, compared to state-of-the-art baselines, PubSub-VFL not only accelerates training by 2x to 7x without compromising accuracy, but also achieves a computational resource utilization rate of up to 91.07%.
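The decoupling idea in this abstract can be illustrated with a toy publisher/subscriber exchange. This is a minimal sketch, not the PubSub-VFL implementation: the function name `run_pubsub_demo`, the party names, the bounded buffer, and the stand-in "embeddings" are all assumptions made for illustration. It only shows that the publishing party does not wait for the subscriber to finish each batch before producing the next one.

```python
import queue
import threading


def run_pubsub_demo(num_batches=5, buffer_size=2):
    """Toy two-party exchange over a bounded channel (all names hypothetical)."""
    channel = queue.Queue(maxsize=buffer_size)  # stands in for a pub/sub topic
    received = []

    def passive_party():
        # Publisher: produces a stand-in "embedding" per batch and publishes it.
        for batch_id in range(num_batches):
            embedding = [batch_id * 0.1, batch_id * 0.2]
            channel.put((batch_id, embedding))  # blocks only when the buffer is full
        channel.put(None)  # sentinel: nothing more will be published

    def active_party():
        # Subscriber: consumes messages whenever they become available.
        while True:
            message = channel.get()
            if message is None:
                break
            batch_id, embedding = message
            received.append((batch_id, sum(embedding)))

    threads = [
        threading.Thread(target=passive_party),
        threading.Thread(target=active_party),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return received
```

With a single publisher and a FIFO queue, batches arrive in publication order; the buffer size bounds how far the publisher may run ahead of the subscriber.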

data mining, efficiency, machine learning, (20 more...)

2510.12494

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(4 more...)

arXiv.org Artificial Intelligence | Oct-14-2025

Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony

Lu, Han, Liu, Zichen, Xiong, Shaopan, He, Yancheng, Gao, Wei, Wu, Yanan, Wang, Weixun, Liu, Jiashun, Li, Yang, Zhao, Haizhou, Huang, Ju, Yang, Siran, Li, Xiaoyang, Luo, Yijia, Liu, Zihe, Pan, Ling, Yan, Junchi, Wang, Wei, Su, Wenbo, Wang, Jiamang, Qu, Lin, Zheng, Bo

Synchronous Reinforcement Learning (RL) post-training has emerged as a crucial step for enhancing Large Language Models (LLMs) with diverse capabilities. However, many systems designed to accelerate RL post-training still suffer from low resource utilization and limited scalability. We present ROLL Flash, a system that extends ROLL with native support for asynchronous RL post-training. ROLL Flash is built upon two core design principles: fine-grained parallelism and rollout-train decoupling. Guided by these principles, ROLL Flash provides flexible programming interfaces that enable a fully asynchronous training architecture and support efficient rollout mechanisms, including queue scheduling and environment-level asynchronous execution. Through comprehensive theoretical analysis and extensive experiments, we demonstrate that ROLL Flash significantly improves resource utilization and scalability over synchronous RL post-training. ROLL Flash achieves up to 2.24x speedup on RLVR tasks and 2.72x on agentic tasks, using the same GPU budget as synchronous baselines. Furthermore, we implement several popular off-policy algorithms and verify that asynchronous training can achieve performance on par with synchronous training.
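Rollout-train decoupling, as described in this abstract, can be sketched with a shared queue between sample producers and a consumer. The code below is an illustrative toy under stated assumptions and does not use the ROLL Flash interfaces: `rollout_worker`, `trainer`, `run_async_demo`, the queue size, and the `(worker_id, step)` samples are hypothetical, and `asyncio.sleep(0)` merely stands in for generation latency.

```python
import asyncio


async def rollout_worker(worker_id, num_samples, sample_queue):
    # Producer: generates samples independently of the trainer's pace.
    for step in range(num_samples):
        await asyncio.sleep(0)  # yield control; stands in for generation time
        await sample_queue.put((worker_id, step))


async def trainer(total_samples, batch_size, sample_queue):
    # Consumer: forms a batch as soon as enough samples have arrived,
    # regardless of which worker produced them.
    batches, batch = [], []
    for _ in range(total_samples):
        batch.append(await sample_queue.get())
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
    if batch:
        batches.append(batch)  # final partial batch, if any
    return batches


async def _main(num_workers, samples_per_worker, batch_size):
    sample_queue = asyncio.Queue(maxsize=8)  # bounds how far rollouts run ahead
    workers = [
        asyncio.create_task(rollout_worker(i, samples_per_worker, sample_queue))
        for i in range(num_workers)
    ]
    batches = await trainer(num_workers * samples_per_worker, batch_size, sample_queue)
    await asyncio.gather(*workers)
    return batches


def run_async_demo(num_workers=3, samples_per_worker=4, batch_size=4):
    return asyncio.run(_main(num_workers, samples_per_worker, batch_size))
```

The bounded queue is one simple way to cap how stale queued samples can become relative to the trainer; the actual system's staleness control may differ.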

large language model, machine learning, reinforcement learning, (19 more...)

2510.11345

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)

arXiv.org Artificial Intelligence | Oct-8-2025

Artificial Intelligence for Cost-Aware Resource Prediction in Big Data Pipelines

Goyal, Harshit

Efficient resource allocation is a key challenge in modern cloud computing. Over-provisioning leads to unnecessary costs, while under-provisioning risks performance degradation and SLA violations. This work presents an artificial intelligence approach to predict resource utilization in big data pipelines using Random Forest regression. We preprocess the Google Borg cluster traces to clean, transform, and extract relevant features (CPU, memory, usage distributions). The model achieves high predictive accuracy (R² = 0.99, MAE = 0.0048, RMSE = 0.137), capturing non-linear relationships between workload characteristics and resource utilization. Error analysis reveals strong performance on small-to-medium jobs, with higher variance on rare large-scale jobs. These results demonstrate the potential of AI-driven prediction for cost-aware autoscaling in cloud environments, reducing unnecessary provisioning while safeguarding service quality.
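For readers unfamiliar with the three reported metrics, the following sketch computes them from their standard definitions in plain Python. The function name `regression_metrics` and the sample values are illustrative assumptions; the paper's own evaluation code and data are not shown here.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, and RMSE for paired lists of targets and predictions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
    }
```

RMSE is never smaller than MAE, and the gap widens when a few errors are much larger than the rest. A large gap between the two is therefore what one would expect if most jobs are predicted closely and a small number are missed by a wide margin.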

artificial intelligence, data mining, machine learning, (20 more...)

2510.05127

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Data Science > Data Mining > Big Data (0.62)

Neural Information Processing Systems | Oct-2-2025, 02:51:53 GMT

Learning Fairness in Multi-Agent Systems

Fairness is essential for human society, contributing to stability and productivity. Similarly, fairness is also the key for many multi-agent systems.

agent, artificial intelligence, fairness, (14 more...)

Industry: Leisure & Entertainment (0.70)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Neural Information Processing Systems | Oct-2-2025, 02:51:38 GMT

10493aa88605cad5ab4752b04a63d172-AuthorFeedback.pdf

We greatly appreciate the efforts made by all the reviewers. Hughes et al. [2018] extend the inequity aversion model and define a shaped reward. These works aim to improve cooperation but cannot guarantee fairness. We compare against Hughes et al. [2018]. More details will be included in the final version. To verify the effectiveness of the hierarchy, we use the hierarchy with other baselines in job scheduling. That demonstrates the effect of the hierarchy. The intuition of the fair-efficient reward is to maximize resource utilization while punishing the agent's utility deviation. The main hyperparameters are contained in the Appendix; we will supplement them further in the final version.
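The stated intuition (reward high utilization, penalize an agent's deviation from the average utility) can be made concrete with a small sketch. The ratio form used below is an assumption chosen for illustration and is not quoted from the text above; the function name `fair_efficient_rewards` and the parameters `scale` and `eps` are likewise hypothetical. Consult the paper for the exact definition.

```python
def fair_efficient_rewards(utilities, scale=1.0, eps=0.1):
    """Illustrative per-agent reward: average utility over utility deviation.

    Assumed form (not taken from the source text):
        reward_i = (mean_u / scale) / (eps + |u_i / mean_u - 1|)
    """
    if not utilities:
        raise ValueError("utilities must be non-empty")

    mean_utility = sum(utilities) / len(utilities)
    if mean_utility == 0:
        # Nothing is utilized, so there is nothing to reward.
        return [0.0] * len(utilities)

    rewards = []
    for utility in utilities:
        deviation = abs(utility / mean_utility - 1.0)  # distance from the average
        rewards.append((mean_utility / scale) / (eps + deviation))
    return rewards
```

Under this form, two agents with utilities `[1.0, 1.0]` each receive a higher reward than two agents with `[2.0, 0.0]`, even though total utilization is the same, because the numerator tracks efficiency and the denominator grows with unfairness.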

artificial intelligence, fair-efficient reward, hierarchy, (16 more...)

Technology: Information Technology > Artificial Intelligence (0.51)